WP-Dyna: Planning and Reinforcement Learning in Well-plannable Environments

Author

ISTVÁN SZITA

Abstract

Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an extended period of time. Finding an optimal policy in RL may be very slow. A commonly used way to speed up learning is to integrate planning, for example with Sutton's Dyna algorithm or with various methods that use macro-actions. Here we propose separating out the plannable, i.e., close to deterministic, parts of the world and focusing planning efforts on this domain. A novel reinforcement learning method called WP-Dyna is proposed. WP-Dyna builds a simple model, which is used to search for macro-actions. The simplicity of the model makes planning computationally inexpensive. It is shown that WP-Dyna finds an optimal policy, and that the plannable macro-actions found by WP-Dyna are near-optimal. As a result, it is unnecessary to try large numbers of macro-actions, which enables fast learning. The utility of WP-Dyna is demonstrated by computer simulations.
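To make the idea of integrating planning into RL concrete, the following is a minimal sketch of tabular Dyna-Q, the textbook form of the Dyna approach the abstract refers to. It is not the WP-Dyna method itself, whose details are not given here. The toy corridor environment, the function names (`dyna_q`, `corridor_step`), and all hyperparameter values are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict

def dyna_q(step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch. `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> value estimate
    model = {}              # (state, action) -> last observed (next_state, reward, done)

    def greedy(state):
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup toward the bootstrapped target.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (s2, r, done)    # assumes a deterministic world
            for _ in range(planning_steps):  # planning: replay simulated experience
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q

def corridor_step(state, action):
    """Hypothetical toy world: states 0..5, reward 1 on reaching state 5."""
    nxt = max(0, min(5, state + action))
    return nxt, (1.0 if nxt == 5 else 0.0), nxt == 5

q_values = dyna_q(corridor_step, start_state=0, actions=[-1, 1])
```

The model in this sketch stores only the last observed outcome of each state-action pair, which is adequate only when transitions are deterministic. That limitation is one way to see why restricting planning to close-to-deterministic parts of the world, as the abstract proposes, can keep the model simple.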

Similar resources

Searching for Plannable Domains can Speed up Reinforcement Learning

Integrated Architectures for Learning, Planning, and Reacting Based

This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna archi...

Integrated Modeling and Control Based on Reinforcement Learning

This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures,...

Planning with neural networks and reinforcement learning

planning with neural networks, time limits of discounted reinforcement learning Planning, taskability, Dyna-PI architectures Dyna-PI architectures: focussing, forward and backward planning, acting and (re)planning. Tested with... Ideas from problem solving and

Reinforcement Learning with a Hierarchy of Abstract Models

Reinforcement learning (RL) algorithms have traditionally been thought of as trial and error learning methods that use actual control experience to incrementally improve a control policy. Sutton's DYNA architecture demonstrated that RL algorithms can work as well using simulated experience from an environment model, and that the resulting computation was similar to doing one-step lookahead plan...

Publication date: 2006
